A Unified Fuzzy Data Model: Representation and Processing

Authors

  • Avichai Meged
  • Roy Gelbard
Abstract

A novel fuzzy data representation model that enables data mining with standard tools is introduced. Many data elements in the world are fuzzy in nature. There is an obvious need to represent and process such data effectively and efficiently, using the same standard tools for crisp data that are popular with researchers and practitioners alike. Currently, however, standard tools cannot process or analyze data that are not adequately represented. The comprehensive data representation model put forward here extends principles of binary databases and provides a unified approach to all types of data: discrete and continuous, crisp and fuzzy. The model is illustrated on a baseline dataset and tested in clustering experiments matched against controlled groupings and a real dataset. The tests confirm that the implementation of the model not only enables the use of standard tools but also yields better results for the segmentation and clustering of fuzzy datasets.

DOI: 10.4018/jdm.2012010104. Journal of Database Management, 23(1), 78-102, January-March 2012. Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

... semantic models described in the background section are unsuitable as input to standard clustering and mining tools, which require a single flat, matrix-like file format such as a relational database table in which each cell contains a single crisp value. The current paper proposes a method that enables the use of standard clustering and mining tools on fuzzy data. Since clustering has become increasingly popular as a data mining technique (Giannotti & Pedreschi, 2008; Manying, 2007), we concentrate on showing how our fuzzy data model applies to clustering. Clustering is crucial in the social sciences, marketing, finance, computer science, biology, medicine and elsewhere. Standard data analysis and mining tools such as SPSS, SAS or Clementine implement widely available clustering algorithms, and are therefore preferable over proprietary tools (for a brief description of clustering issues see Estivill-Castro & Yang, 2004; Gan, Ma, & Wu, 2007; Jain & Dubes, 1988; Jain, Murty, & Flynn, 1999; Lim, Loh, & Shih, 2000; Zhang & Srihari, 2004).

There is an ongoing effort to devise a data model that resolves the discrepancy between the format used to store the database and the representation format demanded by clustering algorithms (Ryu & Eick, 2005). The few studies that have dealt with clustering of fuzzy data have been partially successful, but their clustering methods were restricted to specific fuzzy data types and used dedicated and proprietary algorithms, as described in the background section.

The current paper proposes a unified data representation model that can deal with both crisp and fuzzy data, thus enabling the use of standard clustering and mining tools. The model draws on principles of binary representation employed in commercial databases, motion databases and fuzzy databases (Gelbard & Meged, 2008; Gelbard & Spiegler, 2002; Spiegler & Maayan, 1985). In these binary database models, the data are represented in a matrix where the rows stand for the database entities and the columns stand for different attribute values. In the proposed model, matrix cells are numbers that indicate degrees of attribute value similarity to the “right value”. These similarity numbers are derived from fuzzy database models such as the possibility distribution model (Prade & Testemale, 1984) and the proximity-based fuzzy relational model (Shenoi & Melton, 1989). The model is described and illustrated on a simple baseline dataset. It is tested by a controlled clustering experiment using SPSS software.
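The matrix layout described above can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute names, the values, the degrees and the helper names (COLUMNS, to_row, squared_euclidean) are all hypothetical, and the degrees are simply typed in rather than derived from a possibility distribution or proximity relation. The sketch only shows how crisp and fuzzy entities might share one flat table in which every cell holds a single number, which is the shape a standard clustering tool expects.

```python
# Illustrative sketch only: attributes, values and degrees are hypothetical,
# not taken from the paper.

# One matrix column per (attribute, value) pair.
COLUMNS = [
    ("hair", "blond"), ("hair", "brown"), ("hair", "black"),
    ("height", "short"), ("height", "tall"),
]


def to_row(entity):
    """Flatten one entity into a row of degrees in [0, 1].

    `entity` maps attribute -> {value: degree}. A crisp value is the
    special case of one value with degree 1.0; absent values get 0.0.
    """
    return [entity.get(attr, {}).get(val, 0.0) for attr, val in COLUMNS]


entities = {
    # Crisp entity: exactly one value per attribute.
    "e1": {"hair": {"blond": 1.0}, "height": {"tall": 1.0}},
    # Fuzzy entity: several values per attribute, each with a degree.
    "e2": {"hair": {"blond": 0.7, "brown": 0.3},
           "height": {"tall": 0.8, "short": 0.2}},
    # Another crisp entity, far from e1.
    "e3": {"hair": {"black": 1.0}, "height": {"short": 1.0}},
}

# Rows are entities, columns are attribute values, cells are single numbers.
matrix = {name: to_row(e) for name, e in entities.items()}


def squared_euclidean(a, b):
    """Squared Euclidean distance between two rows of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    for name, row in matrix.items():
        print(name, row)
    print("d(e1, e2) =", squared_euclidean(matrix["e1"], matrix["e2"]))
    print("d(e1, e3) =", squared_euclidean(matrix["e1"], matrix["e3"]))
```

Because every row has the same length and contains only numbers, such a table could in principle be passed to an off-the-shelf distance-based clustering routine without a dedicated fuzzy algorithm; in this toy data the fuzzy entity e2 lies much closer to e1 than e3 does.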


Similar resources

A bi-objective model for a scheduling problem of unrelated parallel batch processing machines with fuzzy parameters by two fuzzy multi-objective meta-heuristics

This paper considers a bi-objective model for a scheduling problem of unrelated parallel batch processing machines to minimize the makespan and maximum tardiness, simultaneously. Each job has a specific size and the data corresponding to its ready time, due date and processing time-dependent machine are uncertain and determined by trapezoidal fuzzy numbers. Each machine has a specific capacity,...


A UNIFIED MODEL FOR RESOURCE-CONSTRAINED PROJECT SCHEDULING PROBLEM WITH UNCERTAIN ACTIVITY DURATIONS

In this paper we present a unified (probabilistic/possibilistic) model for resource-constrained project scheduling problem (RCPSP) with uncertain activity durations and a concept of a heuristic approach connected to the theoretical model. It is shown that the uncertainty management can be built into any heuristic algorithm developed to solve RCPSP with deterministic activity durations. The esse...


A Logic of Knowledge Integrity

In a unified knowledge representation, data, information and knowledge are all represented in a single formalism as “items”. A unified knowledge representation is extended here to include two types of fuzzy measures of knowledge integrity. These integrity measures define a graduated integrity region for data, information and knowledge. This fuzzy region contains knowledge of increasingly questi...


Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...


Incremental Granular Fuzzy Modeling Using Imprecise Data Streams

System modeling in dynamic environments needs processing of streams of sensor data and incremental learning algorithms. This paper suggests an incremental granular fuzzy rule-based modeling approach using streams of fuzzy interval data. Incremental granular modeling is an adaptive modeling framework that uses fuzzy granular data that originate from unreliable sensors, imprecise perceptio...



Journal title:
  • J. Database Manag.

Volume 23, Issue 1

Pages 78-102

Publication date 2012